Multilingual Information Extraction with PolyglotIE

Authors

Alan Akbik

Laura Chiticariu

Marina Danilevsky

Yonas Kbrom

Yunyao Li

Huaiyu Zhu

Abstract

We present POLYGLOTIE, a web-based tool for developing extractors that perform Information Extraction (IE) over multilingual data. Our tool has two core features. First, it allows users to develop extractors against a unified abstraction that is shared across a large set of natural languages. This means that an extractor needs to be created only once, for one language, but will then run on multilingual data without any additional effort or language-specific knowledge on the part of the user. Second, it embeds this abstraction as a set of views within a declarative IE system, allowing users to quickly create extractors using a mature IE query language. We present POLYGLOTIE as a hands-on demo in which users can experiment with creating extractors, execute them on multilingual text, and inspect extraction results. Using the UI, we discuss the challenges and potential of using unified, cross-lingual semantic abstractions as a basis for downstream applications. We demonstrate multilingual IE for 9 languages from 4 different language groups: English, German, French, Spanish, Japanese, Chinese, Arabic, Russian and Hindi.
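To make the "write once, run on many languages" idea concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each sentence has already been mapped by some upstream analyzer (not shown) into predicate-argument frames whose labels and role names are shared across languages, in a PropBank-like style (`buy.01`, `A0`, `A1`); the abstract does not specify the actual form of PolyglotIE's abstraction. The `Frame` class, the `acquisitions` extractor and the hand-written example frames are all hypothetical and do not reflect PolyglotIE's real data model or its query language.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Frame:
    """A predicate-argument structure in a HYPOTHETICAL language-neutral layer."""
    language: str          # language of the source sentence, e.g. "de"
    sentence: str          # the original sentence, kept in its own language
    label: str             # shared predicate label (PropBank-like, assumed)
    roles: Dict[str, str]  # shared role name -> text span in the source language


def acquisitions(frames: List[Frame]) -> List[Tuple[str, str, str]]:
    """Hypothetical extractor written once: (language, buyer, thing bought).

    It inspects only the shared label and role names, never the language,
    so the same rule applies to every language mapped into the abstraction.
    """
    return [
        (f.language, f.roles["A0"], f.roles["A1"])
        for f in frames
        if f.label == "buy.01" and "A0" in f.roles and "A1" in f.roles
    ]


# Hand-written toy frames standing in for the output of an upstream analyzer.
frames = [
    Frame("en", "IBM bought the startup.", "buy.01",
          {"A0": "IBM", "A1": "the startup"}),
    Frame("de", "Siemens kaufte das Unternehmen.", "buy.01",
          {"A0": "Siemens", "A1": "das Unternehmen"}),
    Frame("es", "Google compró la empresa.", "buy.01",
          {"A0": "Google", "A1": "la empresa"}),
    Frame("fr", "Le magasin a ouvert tôt.", "open.01",
          {"A1": "Le magasin"}),
]

for row in acquisitions(frames):
    print(row)
# ('en', 'IBM', 'the startup')
# ('de', 'Siemens', 'das Unternehmen')
# ('es', 'Google', 'la empresa')
```

Because the extractor tests only the shared frame label and role names, the same function returns matches from the English, German and Spanish sentences without any language-specific rules, and skips the French frame because its label differs. According to the abstract, PolyglotIE itself exposes its abstraction as views in a declarative IE system, so real extractors would be written as queries over those views rather than as Python code.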

Similar resources

Modern Multilingual and Cross-lingual Information Access Technologies

In this chapter, we describe the state-of-the-art cross-lingual and multilingual strategies and their related areas. In particular, we show a WWW-based information system called MIETTA, which allows uniform and multilingual access to heterogeneous data sources in the tourism domain. The design of the search engine is based on a new cross-lingual framework. The framework integrates a cross-lingu...

GATE: A Unicode-based Infrastructure Supporting Multilingual Information Extraction

NLP infrastructures with comprehensive multilingual support can substantially decrease the overhead of developing Information Extraction (IE) systems in new languages by offering support for different character encodings, language-independent components, and clean separation between linguistic data and the algorithms that use it. This paper will present GATE – a Unicode-aware infrastructure that...

Multilingual Extraction Ontologies

The growth of multilingual web content and increasing internationalization portends the need for cross-language query processing. We offer ML-OntoES (a MultiLingual Ontology-based Extraction System) as a solution for narrow-domain/data-rich applications. Based on language-independent extraction ontologies (Embley, Liddle, & Lonsdale, 2011), ML-OntoES enables semantic search over domain-specific,...

Multilingual Ontologies and English-Bulgarian Ontology Development

In this paper we present a short survey of approaches for developing multilingual ontologies. Our main goal is to find an appropriate approach for developing multilingual ontologies, including Bulgarian language terminology. We propose a collaborative methodology for developing English-Bulgarian bilingual ontologies using information extraction from e-learning textual content, l...

Extraction of Multilingual Term Variants in the Business Reporting Domain

Within the context of the European research project “Monnet”, which implements, among other activities, ontology-based multilingual information extraction, we tackle the issue of recognizing variants of concept labels in business reports that guide the information extraction process. In this short paper, we describe two related experiments in finding variants of multilingual taxonomy labels u...

Publication date: 2016